1.  Introduction.


Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (i.i.d.)
random variables with distribution function (d.f.) $P(x) = \Pr(X \le x)$. We consider
the randomly right-censored model where the value of the random variable $X$ is
sometimes unobservable. Associated with each $X_i$ is a variable $Y_i$ independent of
$X_i$. These $Y_i$'s are i.i.d. with d.f. $Q(x)$. The observations consist of the pairs
$(Z_i, \delta_i)$, $i = 1, 2, \ldots, n$, where $Z_i = \min(X_i, Y_i)$,
$\delta_i = I(X_i \le Y_i)$, and $I(A)$ is the indicator function of the set $A$.
Typically the concern of the statistician is how best to make use of the $Z_i$'s
and $\delta_i$'s to estimate $P$ or some functional of $P$.
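The observation scheme above can be sketched in a few lines. This is only an illustration: the exponential lifetimes, the rate parameters, and the names `censored_sample` and `uncensored_fraction` are assumptions made for the example and do not come from the model description, which allows arbitrary d.f.'s $P$ and $Q$.

```python
import random

def censored_sample(n, x_rate=1.0, y_rate=0.5, seed=0):
    """Draw n pairs (Z, delta) with Z = min(X, Y) and delta = I(X <= Y)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.expovariate(x_rate)   # lifetime X (exponential is an assumed choice)
        y = rng.expovariate(y_rate)   # censoring time Y, drawn independently of X
        pairs.append((min(x, y), 1 if x <= y else 0))
    return pairs

def uncensored_fraction(pairs):
    """Share of observations with delta = 1, i.e. where X itself was observed."""
    return sum(delta for _, delta in pairs) / len(pairs)

sample = censored_sample(20000)
# For independent exponentials, Pr(X <= Y) = x_rate / (x_rate + y_rate).
print(uncensored_fraction(sample))
```

Lowering `y_rate` makes $Y$ stochastically larger, so more of the $X_i$ are observed directly.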

Given  that  this  censoring  is  to  take  place  another  question  arises.  Suppose 
more  than  one  censoring  variable  is  available  and  the  experimenter  is  given  his 
choice  as  to  which  to  use.  Which  variable  should  he  choose?  One  approach  is  to 
choose  the  censoring  variable  which  provides  the  greatest  "information".  Thus 
we  seek  general  ways  to  measure  information  in  censored  models. 

What  properties  should  information  measures  possess?  It  is  reasonable  to 
expect  that  it  is  better  to  observe  an  X  than  a  Y.  Furthermore,  stochastically 
increasing  Y  should  increase  information.  Thus  we  consider  the  following  two 
requirements  for  information  measures. 

If $Y_1 \le_{st} Y_2$, the information in $(Z_1, \delta_1)$ is less than the
information in $(Z_2, \delta_2)$, where $Z_i = \min(X, Y_i)$ and
$\delta_i = I(X \le Y_i)$, $i = 1, 2$, for every $X$.                          (1.1)

For every $X$ and $Y$ the information in $X$ is greater than the information
in $(Z, \delta)$, $Z = \min(X, Y)$, $\delta = I(X \le Y)$.                     (1.2)
In general (1.2) will follow from (1.1) by taking $Y_2 = \infty$ so that $Z_2$ has the
same distribution as $X$. The adequacy of all information measures considered
here will be with regard to (1.1) and (1.2). If (1.1) or (1.2) fails to hold,
the measure is inadequate. Note that it is the monotonicity of the measure that
is of interest, as the measure can be made to increase or decrease by simply
changing the sign in its definition.

In Section 2 notions of bivariate dependence are used to measure information.
Models in which $Y$ is increased stochastically should generally lead to
increased dependence of $X$ and $Z$. Thus measures of dependence provide a natural
framework for studying information in the censored model.

Various notions of bivariate dependence are considered as candidates for
measures of information. These include positive quadrant dependence (PQD),
association, left-tail decreasing (LTD), right-tail increasing (RTI), and
stochastically increasing (SI). Each of these is a notion of positive dependence
which requires a certain probability that the variables are in some quadrant to
be positive. These notions are extended to notions of "more positive quadrant
dependent," "more associated," etc., by requiring that this probability be
increasing. Then these new notions of increased positive dependence are
considered for the role of measures of information. With properties (1.1) and (1.2)
as criteria, it is shown that, with the exception of association, all of these
notions of bivariate dependence are satisfactory.

In Section 3 the relationship between $X$ and $Z$ is explored through their
related probability functions. Since $Z$ is equal to $X$ more often as censoring
decreases, the probabilistic structure of $Z$ should approach that of $X$ as
censoring decreases. One way to measure closeness of probability
distributions is by coefficients of divergence. General classes of these measures
have been proposed independently by Csiszár (1963, 1967), Ali and Silvey (1965a,
1965b, 1966), and Ziv and Zakai (1973).

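We use the following conventions: For a function $f$,

\[ f(0) = \lim_{x \to 0} f(x), \qquad 0 \cdot f\!\left(\tfrac{0}{0}\right) = 0, \qquad 0 \cdot f\!\left(\tfrac{a}{0}\right) = a \lim_{x \to \infty} \frac{f(x)}{x}, \quad a > 0. \tag{1.3} \]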

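Let $f(x)$ be a convex function. Let $\alpha(x)$ and $\beta(x)$ be nonnegative measurable
functions on some measure space $(\mathcal{X}, \mathcal{A}, P)$. Then the coefficient of divergence
for $\alpha(x)$ and $\beta(x)$ is defined by

\[ I_f(\alpha, \beta) = \int_{\mathcal{X}} \beta(x)\, f\!\left\{\frac{\alpha(x)}{\beta(x)}\right\} dP(x). \tag{1.4} \]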

For probability density functions $p_1(x)$ and $p_2(x)$, both absolutely continuous
with respect to some measure $\lambda$, (1.4) becomes

\[ I_f(p_1, p_2) = \int p_1(x)\, f\!\left\{\frac{p_2(x)}{p_1(x)}\right\} \lambda(dx). \tag{1.5} \]

This is the measure introduced by Csiszár. Ali and Silvey use a slightly
different version defined by:

\[ I_f(\phi) = E_{P_1}\{f(\phi)\} = \int f(\phi)\, dP_1 + P_2(N) \lim_{x \to \infty} \frac{f(x)}{x}, \tag{1.6} \]

where $\phi$ is the generalized Radon-Nikodym derivative of $P_2$ with respect to $P_1$ and
$N$ is a $P_1$-null set where $P_2$ has positive measure. Note that if $P_1$ and $P_2$ are
mutually absolutely continuous, then (1.5) and (1.6) are identical.

For the censored model we need to find a satisfactory way to define $p_1$ and
$p_2$ in terms of $X$ and $Z$. These must be designed with (1.1) and (1.2) in mind. In
Theorem 3.2 we show that if $p_1$ and $p_2$ are taken to be the survival distributions
of $X$ and $Z$ respectively, (1.1) and (1.2) are satisfied. It would seem more
natural to let $p_1$ and $p_2$ be the respective densities of $X$ and $Z$, but Example 3.3
shows that this is unsatisfactory. However, if in (1.5) $p_2$ is taken to be the
joint density of $X$ and the vector $\underline{Z} = (Z, \delta)$, and $p_1$ is taken to be the product
of the $X$ and $\underline{Z}$ marginals, then (1.1) holds with some restrictions on the convex
function $f(x)$ and the density function of $X$, $p(x)$. Property (1.2) holds without
any restrictions.


2.  Measures of Bivariate Dependence.

Dependence  measures  have  typically  been  developed  to  test  for  independence 
between  two  variables  or  to  measure  the  degree  to  which  large  values  of  one 
variable  go  with  large  values  of  the  other.  Some  general  notions  of  dependence 
are  given  in  the  following  definition. 

Definition 2.1. Given two random variables $U$ and $V$ we say that $U$ and $V$ are:

1) Positively quadrant dependent (PQD) if
\[ \Pr(U \le u, V \le v) \ge \Pr(U \le u)\Pr(V \le v) \quad \text{for all } u, v. \tag{2.1} \]

2) Associated if
\[ \mathrm{Cov}\{\Gamma(U, V), \Delta(U, V)\} \ge 0 \quad \text{for all } \Gamma, \Delta \text{ which are componentwise increasing.} \tag{2.2} \]

3) Left-tail decreasing (LTD($V|U$)) if
\[ \Pr(V \le v \mid U \le u) \text{ is decreasing in } u. \tag{2.3} \]

4) Right-tail increasing (RTI($V|U$)) if
\[ \Pr(V > v \mid U > u) \text{ is increasing in } u. \tag{2.4} \]

5) Stochastically increasing (SI($V|U$)) if
\[ \Pr(V > v \mid U = u) \text{ is increasing in } u. \tag{2.5} \]

These notions are ordered in strength by:

\[ \mathrm{SI}(V|U) \Rightarrow \mathrm{RTI}(V|U) \Rightarrow \text{Association} \Rightarrow \mathrm{PQD}. \tag{2.6} \]

The sequence of implications is the same when RTI($V|U$) is replaced by
LTD($V|U$). For verification of the implications and counterexamples to the
reverse implications, see Barlow and Proschan (1975). Most of the above
definitions were originally given in Lehmann (1966). The notion of association was
introduced in Esary, Proschan, and Walkup (1967).

The inequalities in (2.1) - (2.5) are notions of positive dependence for a
pair of variables. Next we compare the dependences of two sets of variables,
specifically, between the variables $X$ and $Z_1$ and $X$ and $Z_2$, where $Z_i = \min(X, Y_i)$,
$i = 1, 2$. For this a slight generalization of Definition 2.1 is needed.

Definition 2.2. Given four random variables $U_1$, $U_2$, $V_1$, $V_2$, form two pairs
of variables $W_1 = (U_1, V_1)$ and $W_2 = (U_2, V_2)$. We say

1) $W_1$ is more PQD than $W_2$ if for all $u$, $v$,
\[ \Pr(U_1 \le u, V_1 \le v) - \Pr(U_1 \le u)\Pr(V_1 \le v) \ge \Pr(U_2 \le u, V_2 \le v) - \Pr(U_2 \le u)\Pr(V_2 \le v). \tag{2.7} \]

2) $W_1$ is more associated than $W_2$ if
\[ \mathrm{Cov}\{\Gamma(W_1), \Delta(W_1)\} - \mathrm{Cov}\{\Gamma(W_2), \Delta(W_2)\} \ge 0 \tag{2.8} \]
for all componentwise increasing functions $\Gamma$, $\Delta$.

3) $W_1$ is more LTD than $W_2$ if
\[ \Pr(V_1 \le v \mid U_1 \le u') - \Pr(V_1 \le v \mid U_1 \le u) \ge \Pr(V_2 \le v \mid U_2 \le u') - \Pr(V_2 \le v \mid U_2 \le u) \tag{2.9} \]
for all $v$, $u' < u$.

4) $W_1$ is more RTI than $W_2$ if
\[ \Pr(V_1 > v \mid U_1 > u) - \Pr(V_1 > v \mid U_1 > u') \ge \Pr(V_2 > v \mid U_2 > u) - \Pr(V_2 > v \mid U_2 > u') \tag{2.10} \]
for all $v$, $u' < u$.

5) $W_1$ is more SI than $W_2$ if
\[ \Pr(V_1 > v \mid U_1 = u) - \Pr(V_1 > v \mid U_1 = u') \ge \Pr(V_2 > v \mid U_2 = u) - \Pr(V_2 > v \mid U_2 = u') \tag{2.11} \]
for all $v$, $u' < u$.

With  this  definition,  comparisons  in  the  censored  model  can  be  made. 


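Theorem 2.3. In the censored model the amount of positive quadrant dependence
increases as censoring decreases stochastically. That is, if $Y_1 \le_{st} Y_2$ and
$Z_i = \min(X, Y_i)$, $i = 1, 2$, then $(X, Z_2)$ is more PQD than $(X, Z_1)$.

Proof: Write $\bar P = 1 - P$ and $\bar Q_i = 1 - Q_i$, and consider
$\Pr(X \le x, Z_i \le z) - \Pr(X \le x)\Pr(Z_i \le z)$. There are two cases.

1) If $x \le z$, then

\[ \Pr(X \le x, Z_i \le z) - \Pr(X \le x)\Pr(Z_i \le z) = \Pr(X \le x) - \Pr(X \le x)\Pr(Z_i \le z) = P(x)\{1 - K_i(z)\} = P(x)\bar K_i(z) = P(x)\bar P(z)\bar Q_i(z), \]

where $\bar K_i(z) = \bar P(z)\bar Q_i(z)$, the survival function of $Z_i$.

2) If $x > z$, then

\[ \Pr(X \le x, Z_i \le z) - \Pr(X \le x)\Pr(Z_i \le z) = \Pr\{X \le x, \min(X, Y_i) \le z\} - \Pr(X \le x)\Pr(Z_i \le z) \]
\[ = \Pr(X \le z) + \Pr(z < X \le x, Y_i \le z) - \Pr(X \le x)\Pr(Z_i \le z) \]
\[ = P(z) + \{P(x) - P(z)\} Q_i(z) - P(x)\{1 - \bar P(z)\bar Q_i(z)\} \]
\[ = \bar Q_i(z)\{P(z) - P(x) + P(x)\bar P(z)\} = \bar Q_i(z) P(z) \bar P(x). \]

In both cases the expression is increasing in $\bar Q_i(z)$, and $Y_1 \le_{st} Y_2$ means
$\bar Q_1(z) \le \bar Q_2(z)$ for all $z$. ||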
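The two closed forms in the proof can be checked by brute force on a small discrete example. The sketch below is illustrative only: the three probability mass functions and all function names are assumptions chosen for the check, with `p_y1` stochastically smaller than `p_y2`. It compares the PQD excess obtained by enumerating the joint distribution of $(X, Y)$ with the closed forms $P(x)\bar P(z)\bar Q(z)$ for $x \le z$ and $\bar Q(z)P(z)\bar P(x)$ for $x > z$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical pmfs on {1, 2, 3, 4}; Y1 is stochastically smaller than Y2.
p_x = {1: F(1, 4), 2: F(1, 4), 3: F(1, 4), 4: F(1, 4)}
p_y1 = {1: F(1, 2), 2: F(1, 4), 3: F(1, 8), 4: F(1, 8)}  # heavier censoring
p_y2 = {1: F(1, 8), 2: F(1, 8), 3: F(1, 4), 4: F(1, 2)}  # lighter censoring

def cdf(pmf, t):
    return sum((w for v, w in pmf.items() if v <= t), F(0))

def excess_by_enumeration(p_y, x, z):
    """Pr(X<=x, Z<=z) - Pr(X<=x)Pr(Z<=z), summing over the joint law of (X, Y)."""
    joint = marg_x = marg_z = F(0)
    for (xv, px), (yv, py) in product(p_x.items(), p_y.items()):
        w = px * py                      # independence of X and Y
        zv = min(xv, yv)
        if xv <= x and zv <= z:
            joint += w
        if xv <= x:
            marg_x += w
        if zv <= z:
            marg_z += w
    return joint - marg_x * marg_z

def excess_closed_form(p_y, x, z):
    """The two cases from the proof of Theorem 2.3."""
    q_bar = 1 - cdf(p_y, z)
    if x <= z:
        return cdf(p_x, x) * (1 - cdf(p_x, z)) * q_bar
    return q_bar * cdf(p_x, z) * (1 - cdf(p_x, x))

grid = [(x, z) for x in p_x for z in p_x]
matches = all(excess_by_enumeration(p, x, z) == excess_closed_form(p, x, z)
              for p in (p_y1, p_y2) for x, z in grid)
ordered = all(excess_closed_form(p_y2, x, z) >= excess_closed_form(p_y1, x, z)
              for x, z in grid)
print(matches, ordered)
```

Exact rational arithmetic is used so that the comparison is an equality check rather than a tolerance check.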

With Theorem 2.3 it is easy to construct a class of measures for which (1.1)
and (1.2) hold by taking averages of increasing functions of these positive
quadrants. The following theorem is an easy consequence of Theorem 2.3.

Theorem 2.4. For any increasing function $\psi$,
$\iint \psi\{\Pr(X \le x, Z \le z) - \Pr(X \le x)\Pr(Z \le z)\}\,dx\,dz$
will increase as censoring decreases stochastically.

Corollary 2.5. $\mathrm{Cov}(X, Z)$ increases as censoring decreases stochastically.

Proof: $\mathrm{Cov}(X, Z) = \iint \{\Pr(X \le x, Z \le z) - \Pr(X \le x)\Pr(Z \le z)\}\,dx\,dz$, and so the result
is immediate from Theorem 2.4. ||
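Corollary 2.5 can be illustrated by simulation. The sketch below assumes independent exponential lifetimes and censoring times with rates `x_rate` and `y_rate`; this choice and the function name `cov_x_z` are assumptions for the example. For that particular case, integrating the two expressions from the proof of Theorem 2.3 gives $\mathrm{Cov}(X, Z) = 1/(a + b)^2$ for rates $a$ and $b$ (a side calculation, not a result stated in the text), which the simulated values should approximate.

```python
import random

def cov_x_z(y_rate, x_rate=1.0, n=100_000, seed=1):
    """Sample covariance of X and Z = min(X, Y) for independent exponentials."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        x = rng.expovariate(x_rate)
        y = rng.expovariate(y_rate)
        xs.append(x)
        zs.append(min(x, y))
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    return sum((a - mean_x) * (b - mean_z) for a, b in zip(xs, zs)) / (n - 1)

heavy = cov_x_z(y_rate=1.0)    # Y stochastically smaller: more censoring
light = cov_x_z(y_rate=0.25)   # Y stochastically larger: less censoring
print(heavy, light)
```

The covariance under lighter censoring should come out larger, in line with the corollary.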

Covariance  is,  of  course,  a  well  known  measure  of  positive  dependence. 

Many  other  such  measures  can  also  be  shown  to  increase  as  censoring  decreases 
stochastically.  To  show  this,  we  state  the  following  theorem. 

Theorem 2.6. Let $(U_i, V_i^{(1)})$, $i = 1, \ldots, n$, be independent and identically
distributed. Let $(U_i, V_i^{(2)})$, $i = 1, \ldots, n$, be independent and identically
distributed with $(U_i, V_i^{(1)})$ more PQD than $(U_i, V_i^{(2)})$, $i = 1, \ldots, n$. Let $r$, $s$ be
concordant functions, that is, both $r$ and $s$ monotonic in the same direction in
each argument. Then $\{r(U_1, \ldots, U_n), s(V_1^{(1)}, \ldots, V_n^{(1)})\}$ is more PQD than
$\{r(U_1, \ldots, U_n), s(V_1^{(2)}, \ldots, V_n^{(2)})\}$.

The proof is by induction along the lines of Theorems 1 and 2 of Lehmann
(1966).

Corollary 2.7. Kendall's $\tau$, Spearman's $\rho_s$, and Blomqvist's $q$ all increase
as censoring decreases stochastically.

Proof: Let $(X_j, Z_j)$, $j = 1, 2, 3$, denote independent copies of $(X, Z)$.
Kendall's $\tau = \mathrm{Cov}\{\mathrm{sign}(X_2 - X_1), \mathrm{sign}(Z_2 - Z_1)\}$ and hence is increasing by
Theorem 2.6 and Corollary 2.5. Spearman's $\rho_s = 3\,\mathrm{Cov}\{\mathrm{sign}(X_2 - X_1), \mathrm{sign}(Z_3 - Z_1)\}$
and is increasing by Theorem 2.6 and Corollary 2.5. Blomqvist's
$q = 2\{\Pr(X > m_X, Z > m_Z) + \Pr(X \le m_X, Z \le m_Z)\} - 1$, where $m_X$ and $m_Z$ are the
medians of $X$ and $Z$ respectively. This reduces to

\[ 2\{\Pr(X > m_X, Z > m_Z) - \Pr(X > m_X)\Pr(Z > m_Z) + \Pr(X \le m_X, Z \le m_Z) - \Pr(X \le m_X)\Pr(Z \le m_Z)\}, \]

which (from Theorem 2.3) increases as censoring decreases stochastically. ||

So  the  simple  notion  of  positive  quadrant  dependence  has  yielded  a  large 
class  of  measures  which  can  be  used  in  the  censored  model.  It  is  reassuring  to 
note  that  these  include  some  of  the  well-known  measures  of  dependence.  The  next 
notion  in  the  chain  of  (2.6)  is  association. 

In Example 2.8 we show that even though there is less censoring, association
may decrease. This is counter to the theme of (1.1) and so association is
inappropriate as a measure of information in the censored model.

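Example 2.8. Let $\Gamma(X, Z_i) = I(X > x_1, Z_i > z_1)$, $\Delta(X, Z_i) = I(X > x_2, Z_i > z_2)$,
$i = 1, 2$, and let $x_1 < x_2 < z_1 < z_2$. Then

\[ \mathrm{Cov}\{\Gamma(X, Z_i), \Delta(X, Z_i)\} = \bar P(z_2)\bar Q_i(z_2) - \bar P(z_1)\bar Q_i(z_1)\bar P(z_2)\bar Q_i(z_2) = \bar P(z_2)\bar Q_i(z_2)\{1 - \bar P(z_1)\bar Q_i(z_1)\}. \]

Choose $P$, $Q_1$, $Q_2$ so that
$\bar P(z_1) = 1/2$, $\bar Q_1(z_1) = 1$, $\bar Q_2(z_1) = 1/2$, $\bar P(z_2) = 1/4$, $\bar Q_1(z_2) = 5/12$, $\bar Q_2(z_2) = 1/3$. Note
that $\bar Q_1(z_i) \ge \bar Q_2(z_i)$, $i = 1, 2$. Then $\mathrm{Cov}\{\Gamma(X, Z_1), \Delta(X, Z_1)\} = 5/96$, and
$\mathrm{Cov}\{\Gamma(X, Z_2), \Delta(X, Z_2)\} = 6/96$.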
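The arithmetic in Example 2.8 is easy to reproduce with exact fractions. In the sketch below the helper name `cov_indicators` is hypothetical; it evaluates the covariance as $k_2(1 - k_1)$, where $k_j$ is the product of the two survival probabilities at $z_j$, that is, $\Pr(Z_i > z_j)$.

```python
from fractions import Fraction as F

def cov_indicators(p_bar_z1, q_bar_z1, p_bar_z2, q_bar_z2):
    """Cov of the two indicators in Example 2.8, given survival values at z1 and z2."""
    k1 = p_bar_z1 * q_bar_z1   # Pr(Z > z1)
    k2 = p_bar_z2 * q_bar_z2   # Pr(Z > z2)
    return k2 * (1 - k1)

# Survival values quoted in the example; Y1 is the less heavily censoring variable.
cov_1 = cov_indicators(F(1, 2), F(1), F(1, 4), F(5, 12))
cov_2 = cov_indicators(F(1, 2), F(1, 2), F(1, 4), F(1, 3))
print(cov_1, cov_2)
```

Here `cov_1` is the covariance under the lighter censoring and is the smaller of the two, which is the failure of monotonicity the example describes.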

Thus a chain of implications similar to (2.6) using (2.7) - (2.11) is not
possible. This leaves the last three notions: LTD, RTI, and SI.


Theorem 2.9. If $Y_1 \le_{st} Y_2$, then

(i) $(X, Z_2)$ is more RTI than $(X, Z_1)$,

(ii) $(X, Z_2)$ is more LTD than $(X, Z_1)$, and

(iii) $(X, Z_2)$ is more SI than $(X, Z_1)$.

Proof: (i) Let $x' < x$. Then

\[ \Pr(Z > z \mid X > x) - \Pr(Z > z \mid X > x') = \frac{\Pr(X > z, Y > z, X > x)}{\Pr(X > x)} - \frac{\Pr(X > z, Y > z, X > x')}{\Pr(X > x')}. \tag{2.12} \]

There are three cases to consider.

1) Let $x > x' \ge z$. Then (2.12) reduces to $\Pr(Y > z) - \Pr(Y > z) = 0$.

2) Let $x \ge z > x'$. Then (2.12) reduces to
$\Pr(Y > z) - \{\Pr(X > z, Y > z)/\Pr(X > x')\} = \bar Q(z)[1 - \{\bar P(z)/\bar P(x')\}]$.
This decreases as $\bar Q(z)$ decreases.

3) Let $z > x > x'$. Then (2.12) reduces to
$\bar P(z)\bar Q(z)[\{1/\bar P(x)\} - \{1/\bar P(x')\}] = \bar P(z)\bar Q(z)\{\bar P(x)\bar P(x')\}^{-1}\{\bar P(x') - \bar P(x)\}$,
which decreases as $\bar Q(z)$ decreases.

The proofs for LTD and SI follow in an analogous fashion. ||

Now  as  in  the  positive  quadrant  dependence  case,  classes  of  measures  of 
information  can  be  generated  with  Theorem  2.9. 

Theorem 2.10. Let $\psi$ be an increasing function. Then

(1) $\int_z \iint_{x' < x} \psi\{\Pr(Z \le z \mid X \le x') - \Pr(Z \le z \mid X \le x)\}\,dx\,dx'\,dz$ is increasing as censoring
decreases stochastically,

(2) $\int_z \iint_{x' < x} \psi\{\Pr(Z > z \mid X > x) - \Pr(Z > z \mid X > x')\}\,dx\,dx'\,dz$ is increasing as censoring
decreases stochastically,

(3) $\int_z \iint_{x' < x} \psi\{\Pr(Z > z \mid X = x) - \Pr(Z > z \mid X = x')\}\,dx\,dx'\,dz$ is increasing as censoring
decreases stochastically.


3.  Coefficients  of  Divergence. 

When  X  <  Y  we  have  Z=X.  Since  the  variables  X,  Z  are  often  equal,  in  some 
sense  their  underlying  probabilistic  structures  should  be  similar.  From  Kullback 
(1959),  coefficients  which  increase  as  two  distributions  become  less  similar  are 
called  coefficients  of  divergence. 

Csiszár (1963, 1966) generalized the Kullback-Leibler information number
in the following fashion. Let $f(x)$ be a convex function on $\mathbb{R}^+$ satisfying (1.3).

Let $\mu_1$ and $\mu_2$ be two probability distributions on some measurable space
$(\mathcal{X}, \mathcal{A})$. Let $\lambda$ be a measure on $(\mathcal{X}, \mathcal{A})$ such that $\mu_i$ is absolutely continuous with
respect to $\lambda$, $i = 1, 2$. Let $p_i$ be the Radon-Nikodym derivative of $\mu_i$ with respect
to $\lambda$. Define

\[ I_f(\mu_1, \mu_2) = \int p_1(x)\, f\!\left\{\frac{p_2(x)}{p_1(x)}\right\} \lambda(dx). \tag{3.1} \]

$I_f(\mu_1, \mu_2)$ is the $f$-divergence of $\mu_1$ and $\mu_2$.

From a completely different point of view, Ali and Silvey (1965a, 1965b,
1966) and independently Ziv and Zakai (1973) obtain an expression similar to
(3.1). Both pairs of authors consider coefficients which measure the distance
between two probability measures. Ali and Silvey postulate four properties which
they believe the coefficient $d(P_1, P_2)$ should satisfy:

1) $d(P_1, P_2)$ should be defined for all measures $P_1$ and $P_2$ on the
sample space.

2) $d(P_1, P_2) \ge d(P_1 t^{-1}, P_2 t^{-1})$ for all measurable transformations
$y = t(x)$.

3) $d(P_1, P_1) \le d(P_1, P_2)$ for all $P_2$, and if $P_1$ is singular with respect
to $P_2$, $d(P_1, P_2) \ge d(P_1, P_3)$ for all $P_3$.

4) Let $\{P_\theta;\ \theta \in (a, b)\}$ be a family of distributions with densities
$p_\theta(x)$ having monotone likelihood ratio in $x$. Then if $\theta_1 < \theta_2 < \theta_3$,
$d(P_{\theta_1}, P_{\theta_2}) \le d(P_{\theta_1}, P_{\theta_3})$.

With these four postulates they define the coefficients of divergence as:

\[ d_f(P_1, P_2) = E_{P_1}[f(\phi)] = \int_{\phi < \infty} f(\phi)\, dP_1 + P_2(N) \lim_{x \to \infty} \frac{f(x)}{x}, \tag{3.2} \]

where $f(x)$ is a convex function, $\phi = dP_2/dP_1$, and $N$ is a $P_1$-null set where $P_2$ has
positive measure. The only difference between (3.1) and (3.2) is the dominating
measure $\lambda$. The two measures will be identical if $P_1$ and $P_2$ are mutually
absolutely continuous. Note that the measures (3.1) and (3.2) are not symmetric in
$P_1$ and $P_2$. However, if $g(x) = x f(1/x)$ then $I_f(P_1, P_2) = I_g(P_2, P_1)$. Further, $g$ is
convex if and only if $f$ is convex. Define a new function $f^*(x) = f(x) + g(x)$; then
the measure $I_{f^*}(P_1, P_2)$ will be symmetric.
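The duality $I_f(P_1, P_2) = I_g(P_2, P_1)$ with $g(x) = x f(1/x)$, and the symmetry obtained from $f + g$, can be checked numerically for strictly positive discrete distributions, where the dominating measure is counting measure. The sketch below is illustrative: the two probability vectors, the choice $f(x) = x \log x$, and the helper names `f_divergence` and `dual` are assumptions for the example.

```python
import math

def f_divergence(p1, p2, f):
    """I_f(p1, p2) = sum_x p1(x) f(p2(x) / p1(x)), for strictly positive pmfs."""
    return sum(a * f(b / a) for a, b in zip(p1, p2))

def dual(f):
    """Return g with g(x) = x f(1/x)."""
    return lambda x: x * f(1.0 / x)

f = lambda x: x * math.log(x)      # assumed example of a convex f
g = dual(f)
f_star = lambda x: f(x) + g(x)     # symmetrized function

p1 = [0.5, 0.3, 0.2]               # hypothetical distributions
p2 = [0.2, 0.2, 0.6]

lhs = f_divergence(p1, p2, f)
rhs = f_divergence(p2, p1, g)
sym_12 = f_divergence(p1, p2, f_star)
sym_21 = f_divergence(p2, p1, f_star)
print(lhs, rhs, sym_12, sym_21)
```

The first two printed values should agree, as should the last two, up to floating-point rounding.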

With the criteria (3.1) and (3.2), measures of information in the censored
model can be generated by carefully choosing $P_1$ and $P_2$ in terms of $X$ and $Z$. Note
that $P_1$ and $P_2$ need not be probability measures. It is enough that both be
integrable functions and the dominating measure be sigma-finite. Then the following
can be used for information in the censored models.


Definition 3.1. Let $X$ and $Y$ have support on the positive real line. Then
the information in the censored model is defined as in (3.1) where
$p_1(x) = \Pr(X > x)$, the survival function of $X$, and $p_2(x) = \Pr(Z > x)$, the survival
function of $Z$.


Theorem 3.2. With $p_1$ and $p_2$ defined as in Definition 3.1, $I_f(p_1, p_2)$
increases as censoring decreases stochastically.

Proof: Property (4) of Ali and Silvey (1966) will be used. We establish a
partial ordering by saying $\alpha_1 < \alpha_2$ if $Y_{\alpha_1} \ge_{st} Y_{\alpha_2}$. The minimum for $\alpha$ corresponds to
the uncensored case. Let $x_1 < x_2$, and note

\[ \begin{vmatrix} \bar P(x_1) & \bar P(x_2) \\ \bar P(x_1)\bar Q(x_1) & \bar P(x_2)\bar Q(x_2) \end{vmatrix} = \bar P(x_1)\bar P(x_2)\{\bar Q(x_2) - \bar Q(x_1)\}, \]

which is negative for all $x_1 < x_2$. Thus the monotone likelihood ratio property
holds. ||


With this definition both (1.1) and (1.2) follow. At first it would seem
that the more natural choice for $p_1$ and $p_2$ in (3.1) would be the density functions
of $X$ and $Z$. Since $X = Z$ when $X < Y$ it seems reasonable to postulate that the
density of $Z$ should approach that of $X$ as censoring decreases. The following example
shows that this need not be the case.


Example 3.3. Let $X$ be defined on the points $x_1, x_2, x_3, x_4, x_5, x_6, x_7$ with
$\Pr(X = x_i) = 1/12$ for $i = 1, 2, 3, 5, 6, 7$, and $\Pr(X = x_4) = 1/2$. Define three
censoring variables $Y_1$, $Y_2$, $Y_3$, also with support on $\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\}$,
satisfying:

$\Pr(Y_1 = x_i) = 1/12$, $i = 1, 2, 4, 5, 6, 7$;  $\Pr(Y_1 = x_3) = 1/2$,

$\Pr(Y_2 = x_i) = 1/12$, $i = 1, 2, 3, 5, 6, 7$;  $\Pr(Y_2 = x_4) = 1/2$,

$\Pr(Y_3 = x_i) = 1/12$, $i = 1, 2, 3, 4, 6, 7$;  $\Pr(Y_3 = x_5) = 1/2$.

Note that $Y_1 \le_{st} Y_2 \le_{st} Y_3$. Let $Y_i$ be independent of $X$, $i = 1, 2, 3$, and let $Z_i$
be the censored variable associated with $Y_i$. It should be true that
$I_f(X, Z_1) \ge I_f(X, Z_2) \ge I_f(X, Z_3)$, so that the least censored variable is the
least divergent from $X$ and the most censored variable is the most divergent.

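We need to compute $I_f(X, Z_i)$. Note that $X$ and $Z_i$ are mutually absolutely
continuous, so that $\lambda(x)$ in (3.1) is the counting measure. Also, if $X = Y$, we
adopt the convention that a death has been observed. Then with this convention,
direct calculations show that the vectors of probabilities
$\{\Pr(Z_i = x_1), \ldots, \Pr(Z_i = x_7)\}$ are, for $i = 1, 2, 3$ respectively,

(23/144, 21/144, 64/144, 27/144, 5/144, 3/144, 1/144),
(23/144, 21/144, 19/144, 1/2, 5/144, 3/144, 1/144),
(23/144, 21/144, 19/144, 57/144, 20/144, 3/144, 1/144).

Then direct substitution into (3.1) yields

\[ I_f(X, Z_1) = \tfrac{1}{12} f\!\left(\tfrac{23}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{21}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{16}{3}\right) + \tfrac{1}{2} f\!\left(\tfrac{3}{8}\right) + \tfrac{1}{12} f\!\left(\tfrac{5}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{3}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{1}{12}\right), \]

\[ I_f(X, Z_2) = \tfrac{1}{12} f\!\left(\tfrac{23}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{21}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{19}{12}\right) + \tfrac{1}{2} f(1) + \tfrac{1}{12} f\!\left(\tfrac{5}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{3}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{1}{12}\right), \]

\[ I_f(X, Z_3) = \tfrac{1}{12} f\!\left(\tfrac{23}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{21}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{19}{12}\right) + \tfrac{1}{2} f\!\left(\tfrac{57}{72}\right) + \tfrac{1}{12} f\!\left(\tfrac{20}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{3}{12}\right) + \tfrac{1}{12} f\!\left(\tfrac{1}{12}\right). \]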


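Thus,

\[ I_f(X, Z_1) - I_f(X, Z_2) = \tfrac{1}{12} f\!\left(\tfrac{16}{3}\right) + \tfrac{1}{2} f\!\left(\tfrac{3}{8}\right) - \tfrac{1}{12} f\!\left(\tfrac{19}{12}\right) - \tfrac{1}{2} f(1), \]

\[ I_f(X, Z_2) - I_f(X, Z_3) = \tfrac{1}{2} f(1) + \tfrac{1}{12} f\!\left(\tfrac{5}{12}\right) - \tfrac{1}{2} f\!\left(\tfrac{57}{72}\right) - \tfrac{1}{12} f\!\left(\tfrac{20}{12}\right). \]

Take $f(x) = (x - 1)^2$. Then

\[ I_f(X, Z_1) - I_f(X, Z_2) = 1.73 \ge 0, \]
\[ I_f(X, Z_2) - I_f(X, Z_3) = -.0304 \le 0. \]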
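The computations in Example 3.3 can be reproduced exactly. The sketch below rebuilds the distributions of $Z_1$, $Z_2$, $Z_3$ from the stated distributions of $X$ and $Y_i$ and evaluates (3.1) with counting measure. The choice $f(x) = (x - 1)^2$ is an assumption of this sketch; it is the choice that reproduces the two quoted differences, 1.73 and -.0304. All function names are hypothetical.

```python
from fractions import Fraction as F

def pmf_with_mode(mode):
    """Mass 1/2 at x_mode and 1/12 at each of the other six points x_1, ..., x_7."""
    return [F(1, 2) if i == mode else F(1, 12) for i in range(1, 8)]

def survival(pmf):
    """out[k] = Pr(V > x_k) for k = 0, ..., 7, with out[0] = 1."""
    out = [F(1)]
    for w in pmf:
        out.append(out[-1] - w)
    return out

def censored_pmf(p_x, p_y):
    """pmf of Z = min(X, Y), using Pr(Z > x_k) = Pr(X > x_k) Pr(Y > x_k)."""
    sx, sy = survival(p_x), survival(p_y)
    return [sx[k] * sy[k] - sx[k + 1] * sy[k + 1] for k in range(7)]

def divergence(p1, p2, f):
    """(3.1) with counting measure: sum of p1 * f(p2 / p1)."""
    return sum(a * f(b / a) for a, b in zip(p1, p2))

f = lambda x: (x - 1) ** 2                    # assumed choice of f
p_x = pmf_with_mode(4)                        # X has its mode at x_4
z = {i: censored_pmf(p_x, pmf_with_mode(m))   # Y_1, Y_2, Y_3 have modes x_3, x_4, x_5
     for i, m in ((1, 3), (2, 4), (3, 5))}
i_f = {i: divergence(p_x, z[i], f) for i in z}
d12 = i_f[1] - i_f[2]
d23 = i_f[2] - i_f[3]
print(float(d12), float(d23))
```

The sign change between `d12` and `d23` is the reversal discussed in the example.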


The above inequality reverses the expected order. Why does this happen?
Note that $X$ has a large mode at the point $x_4$. In the censoring variables $Y_i$,
the mode moves from $x_3$ to $x_4$ to $x_5$. As the mode of $Y_i$ moves toward the mode of
$X$, $Z_i$ resembles $X$ more and more. When the mode of $Y_i$ reaches that of $X$, $Z_i$ will
resemble $X$ closely; $Z_i$ will be unimodal at the same point as $X$. Now as the mode
of $Y_i$ continues moving to the right, $Z_i$ no longer has such large probability at
$x_4$. If the mode of $Y_i$ continues to the right, eventually $Z_i$ becomes bimodal and
thus $Z_i$ appears to be different from $X$ by a substantially greater amount than
when the modes are equal.

Thus Example 3.3 shows that with the choice $p_1$ the density of $X$, $p_2$ the
density of $Z$, criterion (1.1) fails to hold. It is true however that for this
choice of $p_1$, $p_2$ the measure does satisfy (1.2). This follows immediately from
Lemma 1.1 of Csiszár (1967) or from Property 3 of Ali and Silvey (1966).

In order to develop a more satisfactory measure we consider the vector
$\underline{Z} = (Z, \delta)$ and return to the concept of dependence.

Consider $X$ and $\underline{Z}$ as the two variables of interest; then the $f$-divergence of
the Radon-Nikodym derivative of the joint distribution of $X$ and $\underline{Z}$ with respect to
the product of their marginals is the information measure.

Note that the joint density of $X$ and $\underline{Z}$ puts positive probability on the line
where $X = Z$, the 45° line passing through the origin. This line has zero
two-dimensional Lebesgue measure. Thus $p_1$ and $p_2$ defined as the joint distribution
of $X$ and $\underline{Z}$ and the product of the marginals are not mutually absolutely
continuous. Hence the measures in (3.1) and (3.2) are no longer equivalent. Equation
(3.2) is now useful only if $\lim_{x \to \infty} f(x)/x$ is finite. Equation (3.1) requires a
measure $\lambda(x)$ which dominates both the joint density of $X$ and $\underline{Z}$ and the product of
the marginals. Let $\lambda(x)$ be the sum of two-dimensional Lebesgue measure and a
measure $\nu$, which is Lebesgue measure on the 45° line, $\{(x, y): x = y,\ x > 0,\ y > 0\}$.


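For the joint probability measure of $(X, \underline{Z})$, we write $\Pr\{X = x, \underline{Z} = (z, 0)\} =
p(x)q(z)$, for $x > z$, 0 otherwise, and $\Pr\{X = x, \underline{Z} = (z, 1)\} = p(x)\bar Q(x)$, for $x = z$,
0 otherwise. Then (3.1) becomes

\[ I_f(p_X p_Z, p_{X \times Z}) = \int_0^\infty p(x)\,p(x)\bar Q(x)\, f\!\left\{\frac{p(x)\bar Q(x)}{p(x)\,p(x)\bar Q(x)}\right\} dx + \iint_{z < x} p(x)\,q(z)\bar P(z)\, f\!\left\{\frac{p(x)q(z)}{p(x)\,q(z)\bar P(z)}\right\} dz\,dx, \]

which reduces to

\[ \int_0^\infty p(x)\,p(x)\bar Q(x)\, f\{1/p(x)\}\,dx + \int_0^\infty q(x)\bar P(x)\bar P(x)\, f\{1/\bar P(x)\}\,dx. \tag{3.3} \]

Take $g(x) = x f(1/x)$; then (3.3) becomes

\[ I_f(p_X p_Z, p_{X \times Z}) = \int_0^\infty p(x)\bar Q(x)\, g\{p(x)\}\,dx + \int_0^\infty q(x)\bar P(x)\, g\{\bar P(x)\}\,dx. \tag{3.4} \]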

The expression in (3.4) can be viewed as an expected loss. In this case the
amount $g\{p(x)\}$ is lost when $X = x$ is observed. If a censored observation is
observed at time $x$, the loss is $g\{\bar P(x)\}$. Now if censoring increases
stochastically, losses $g\{\bar P(x)\}$ occur more frequently while losses $g\{p(x)\}$ occur less
frequently. The original premise was that increased censoring leads to decreased
dependence so that the joint distribution should be closer to the product of the
marginals. Thus the $f$-divergence should decrease as censoring increases. Thus
if $g\{\bar P(x)\} \le g\{p(x)\}$ the monotone criterion in (1.1) is satisfied.

Equation (3.4) can be rewritten as

\[ I_f(p_X p_Z, p_{X \times Z}) = \int_0^\infty q(z) \left[ \int_0^z p(x)\, g\{p(x)\}\,dx + \bar P(z)\, g\{\bar P(z)\} \right] dz. \tag{3.5} \]

This measure is equivalent to a measure of information in the discrete case
developed in Hollander, Proschan, and Sconing (1985). Now if censoring increases
stochastically (3.5) should decrease. This is equivalent to the term
$\Psi(z) = \int_0^z p(x)\, g\{p(x)\}\,dx + \bar P(z)\, g\{\bar P(z)\}$ being increasing. Assume $g$ is
differentiable. Then

\[ \Psi'(z) = p(z)\, g\{p(z)\} - p(z)\bar P(z)\, g'\{\bar P(z)\} - p(z)\, g\{\bar P(z)\}, \]

which is positive if and only if for every $z$

\[ g\{p(z)\} \ge \bar P(z)\, g'\{\bar P(z)\} + g\{\bar P(z)\}. \tag{3.6} \]

Unfortunately inequality (3.6) is not always satisfied. For example, take
$g(x) = -\log x$ and $\bar P(x) = \exp(-\lambda x)$; then the direction of the inequality depends on $\lambda$.
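The dependence on $\lambda$ in this example can be made concrete. With $g(x) = -\log x$, $\bar P(z) = e^{-\lambda z}$ and $p(z) = \lambda e^{-\lambda z}$, the left side of (3.6) minus the right side works out to $1 - \log \lambda$ for every $z$, so the inequality holds exactly when $\lambda \le e$. The sketch below evaluates that difference numerically; the function name `gap` and the particular values of $\lambda$ and $z$ are assumptions for the illustration.

```python
import math

def gap(lam, z):
    """Left side minus right side of (3.6) for g(x) = -log x, survival exp(-lam * z)."""
    g = lambda x: -math.log(x)
    g_prime = lambda x: -1.0 / x
    surv = math.exp(-lam * z)    # survival function of X at z
    dens = lam * surv            # density of X at z
    return g(dens) - (surv * g_prime(surv) + g(surv))

for lam in (1.0, math.e, 4.0):
    print(lam, [round(gap(lam, z), 6) for z in (0.1, 1.0, 5.0)])
```

A nonnegative value means (3.6) holds at that point; the value does not vary with `z` in this example.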

However, some conditions can be found for $g(x)$ and $p(x)$ so that (3.6) is satisfied.

Two such conditions are:

C1: $g$ decreasing on $[0, 1]$ and $p(z)\{\bar P(z)\}^{-1} \le 2$,

C2: $g$ increasing on $[0, 1]$ and $p(z)\{\bar P(z)\}^{-1} \ge 2$.

Theorem 3.4. If either C1 or C2 holds and $g'(x)$ is continuous on $[0, \infty)$, then
$I_f(p_X p_Z, p_{X \times Z})$ is decreasing as censoring increases stochastically.

Proof: It is enough to show (3.6). Expand $g\{p(z)\}$ in a Taylor series about
$\bar P(z)$. Then, since $g$ is convex,

\[ g\{p(z)\} \ge g\{\bar P(z)\} + g'\{\bar P(z)\}\{p(z) - \bar P(z)\} \]
\[ = g\{\bar P(z)\} + \bar P(z)\, g'\{\bar P(z)\} + g'\{\bar P(z)\}\{p(z) - 2\bar P(z)\} \]
\[ \ge g\{\bar P(z)\} + \bar P(z)\, g'\{\bar P(z)\}, \]

if $g'\{\bar P(z)\}\{p(z) - 2\bar P(z)\} \ge 0$, which holds if C1 or C2 holds. ||

In terms of the original function $f(x)$, $g(x)$ decreasing is equivalent to
$f(x)/x$ increasing, $1 \le x < \infty$. Most of the functions $f(x)$ which are commonly used
in $f$-divergence satisfy the necessary condition.


Example 3.5.

1) $f(x) = x \log x$,  $g(x) = -\log x$:  Kullback-Leibler information number

2) $f(x) = (1/2)(x^{.5} - 1)^2$,  $g(x) = (1/2)(x^{.5} - 1)^2$:  Hellinger metric

3) $f(x) = (1/2)|x - 1|$,  $g(x) = (1/2)|x - 1|$:  city-block distance

4) $f(x) = (x - 1)^2$,  $g(x) = (x - 1)^2/x$:  $\chi^2$-distance
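The pairs $(f, g)$ listed in Example 3.5 can be checked mechanically against the definition $g(x) = x f(1/x)$, together with the monotonicity of $g$ on $(0, 1]$. The sketch below does this on a finite grid, so it is a numerical spot check rather than a proof; the dictionary keys, the grid, and the helper names are assumptions for the illustration.

```python
import math

# (f, g) pairs as listed in Example 3.5; the labels are only dictionary keys.
pairs = {
    "Kullback-Leibler": (lambda x: x * math.log(x), lambda x: -math.log(x)),
    "Hellinger": (lambda x: 0.5 * (math.sqrt(x) - 1) ** 2,
                  lambda x: 0.5 * (math.sqrt(x) - 1) ** 2),
    "city-block": (lambda x: 0.5 * abs(x - 1), lambda x: 0.5 * abs(x - 1)),
    "chi-square": (lambda x: (x - 1) ** 2, lambda x: (x - 1) ** 2 / x),
}

grid = [k / 100 for k in range(1, 101)]   # points in (0, 1]

def transform_matches(f, g):
    """Check g(x) = x f(1/x) at every grid point."""
    return all(math.isclose(x * f(1 / x), g(x), rel_tol=1e-9, abs_tol=1e-12)
               for x in grid)

def decreasing_on_grid(g):
    """Check that g is nonincreasing along the grid."""
    values = [g(x) for x in grid]
    return all(a >= b for a, b in zip(values, values[1:]))

report = {name: (transform_matches(f, g), decreasing_on_grid(g))
          for name, (f, g) in pairs.items()}
print(report)
```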


It is easy to verify that in the above four cases, $g(x)$ is decreasing. Note
that the third function does not satisfy the conditions of Theorem 3.4. However
the ordering still holds under slightly more restrictive conditions.

Theorem 3.6. If $g$ is decreasing on $(0, 1)$ and $p(z)\{\bar P(z)\}^{-1} \le 1$, then
$I_f(p_X p_Z, p_{X \times Z})$ is decreasing as censoring increases stochastically.

Proof: If $g$ is decreasing and $p(z)/\bar P(z) \le 1$, then $g\{p(z)\} \ge g\{\bar P(z)\}$. Equation (3.6)
follows since $g'(x) \le 0$ on $(0, 1)$. ||

These last two theorems use the divergence measure as defined in (3.1). As
was stated previously (3.2) is not satisfactory unless $\lim_{x \to \infty} f(x)/x < \infty$. Of the
four functions cited in Example 3.5 only the second and third functions fit this
criterion. In particular the third function, $f(x) = (1/2)|x - 1|$, is the one
originally proposed by Ali and Silvey (1965a) for measuring dispersion between the
joint distribution of two variables and the product of their marginals. In the
censored model, the set $N$ corresponds to the set where $X = Z$, or equivalently, when
$X \le Y$. Then (3.2) becomes

\[ d_f(p_{X \times Z}, p_X p_Z) = \int_0^\infty q(x)\bar P^2(x)\, f\{1/\bar P(x)\}\,dx + c \int_0^\infty p(x)\bar Q(x)\,dx, \tag{3.7} \]

where $c = \lim_{x \to \infty} f(x)/x$.


Theorem 3.7. If $f$ is such that $\lim_{x \to \infty} f(x)/x = c < \infty$ and $f(x)/x$ is increasing
for $1 \le x < \infty$, then $d_f(p_{X \times Z}, p_X p_Z)$ increases as censoring decreases stochastically.


Proof: Consider (3.7) as an expected loss over the variable $Z$ with loss
$\bar P(x) f\{1/\bar P(x)\}$ when $Z = x$ and $Y < X$, and loss $c$ when $X \le Y$. So the loss function can
be written as $\bar P(x) f\{1/\bar P(x)\} I(Y < X) + c\, I(X \le Y)$. As $Y$ increases stochastically,
so does $Z$. Since $f(x)/x$ increases to $c$ as $x$ increases, the loss function is
increasing. Hence the expected loss increases. ||
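Expression (3.7) can be evaluated numerically for a concrete case. The sketch below assumes exponential $X$ and $Y$ with rates `x_rate` and `y_rate` and the city-block function $f(x) = (1/2)|x - 1|$, for which $c = 1/2$; these choices, the midpoint-rule integration, and the function name are assumptions for the illustration. For this case a direct calculation (not taken from the text) gives the closed form $a/(2a + b)$ for rates $a$ and $b$, which decreases as the censoring rate $b$ grows.

```python
import math

def d_f_city_block(x_rate, y_rate, upper=60.0, steps=60_000):
    """Midpoint-rule evaluation of (3.7) for f(x) = |x - 1| / 2, so that c = 1/2."""
    f = lambda t: 0.5 * abs(t - 1.0)
    c = 0.5
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        p_bar = math.exp(-x_rate * x)            # survival function of X
        q_bar = math.exp(-y_rate * x)            # survival function of Y
        p, q = x_rate * p_bar, y_rate * q_bar    # densities of X and Y
        total += (q * p_bar ** 2 * f(1.0 / p_bar) + c * p * q_bar) * h
    return total

heavy = d_f_city_block(1.0, 1.0)     # more censoring
light = d_f_city_block(1.0, 0.25)    # less censoring
print(heavy, light)
```

The value under lighter censoring should be the larger one, consistent with Theorem 3.7.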

In Example 3.5 both the Hellinger metric and the city-block distance satisfy
the conditions of Theorem 3.7. The conditions in Theorem 3.7 are less
restrictive than those of Theorem 3.4 in the sense that there is no condition on the
distribution of $X$. Of course the conditions in Theorem 3.7 are more restrictive
in the sense that they allow far fewer functions $f$.